Index Term
   HOME

TheInfoList



OR:

In
information retrieval Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other co ...
, an index term (also known as subject term, subject heading, descriptor, or keyword) is a term that captures the essence of the topic of a document. Index terms make up a
controlled vocabulary Control may refer to: Basic meanings Economics and business * Control (management), an element of management * Control, an element of management accounting * Comptroller (or controller), a senior financial officer in an organization * Controllin ...
for use in
bibliographic record A bibliographic record is an entry in a bibliographic index (or a library catalog) which represents and describes a specific resource. A bibliographic record contains the data elements necessary to help users identify and retrieve that resource, as ...
s. They are an integral part of bibliographic control, which is the function by which libraries collect, organize and disseminate documents. They are used as keywords to retrieve documents in an information system, for instance, a catalog or a
search engine A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...
. A popular form of keywords on the web are tags, which are directly visible and can be assigned by non-experts. Index terms can consist of a word, phrase, or alphanumerical term. They are created by analyzing the document either manually with
subject indexing Subject indexing is the act of describing or classifying a document by index terms, keywords, or other symbols in order to indicate what different documents are ''about'', to summarize their contents or to increase findability. In other words, i ...
or automatically with
automatic indexing Automatic indexing is the computerized process of scanning large volumes of documents against a controlled vocabulary, taxonomy, thesaurus or ontology and using those controlled terms to quickly and effectively index large electronic document de ...
or more sophisticated methods of keyword extraction. Index terms can either come from a controlled vocabulary or be freely assigned. Keywords are stored in a
search index Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and ...
. Common words like
articles Article often refers to: * Article (grammar), a grammatical element used to indicate definiteness or indefiniteness * Article (publishing), a piece of nonfictional prose that is an independent part of a publication Article may also refer to: G ...
(a, an, the) and conjunctions (and, or, but) are not treated as keywords because it's inefficient. Almost every English-language site on the Internet has the article "''the''", and so it makes no sense to search for it. The most popular search engine,
Google Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...
removed
stop words Stop words are the words in a stop list (or ''stoplist'' or ''negative dictionary'') which are filtered out (i.e. stopped) before or after processing of natural language data (text) because they are insignificant. There is no single universal list ...
such as "the" and "a" from its indexes for several years, but then re-introduced them, making certain types of precise search possible again. The term "descriptor" was by
Calvin Mooers Calvin Northrup Mooers (October 24, 1919 – December 1, 1994), was an American computer scientist known for his work in information retrieval and for the programming language TRAC. Early life Mooers was a native of Minneapolis, Minnesota, atte ...
in 1948. It is in particular used about a preferred term from a
thesaurus A thesaurus (plural ''thesauri'' or ''thesauruses'') or synonym dictionary is a reference work for finding synonyms and sometimes antonyms of words. They are often used by writers to help find the best word to express an idea: Synonym diction ...
. The
Simple Knowledge Organization System Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the ...
language (SKOS) provides a way to express index terms with
Resource Description Framework The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard originally designed as a data model for metadata. It has come to be used as a general method for description and exchange of graph data. RDF provides a variety of ...
for use in the context of the Semantic Web.


In web search engines

Most
web search engine A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...
s are designed to search for words anywhere in a document—the title, the body, and so on. This being the case, a keyword can be any term that exists within the document. However, priority is given to words that occur in the title, words that recur numerous times, and words that are explicitly assigned as keywords within the coding. Index terms can be further refined using Boolean operators such as "AND, OR, NOT." "AND" is normally unnecessary as most search engines infer it. "OR" will search for results with one search term or another or both. "NOT" eliminates a word or phrase from the search, getting rid of any results that include it. Multiple words can also be enclosed in quotation marks to turn the individual index terms into a specific index ''phrase''. These modifiers and methods all help to refine search terms, to better maximize the accuracy of search results.CLIO. ''Keyword search''. Columbia University Libraries. Retrieved from http://www.columbia.edu/cu/lweb/help/clio/keyword.html


Author keywords

Author keywords are an integral part of literature. Many journals and databases provide access to index terms made by authors of the respective articles. How qualified the provider is decides the quality of both indexer-provided index terms and author-provided index terms. The quality of these two types of index terms is of research interest, particularly in relation to
information retrieval Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other co ...
. In general, an author will have difficulty providing indexing terms that characterize his or her document ''relative'' to other documents in the database.


Examples

* Canadian Subject Headings (CS) *
Library of Congress Subject Headings The Library of Congress Subject Headings (LCSH) comprise a thesaurus (in the information science sense, a controlled vocabulary) of subject headings, maintained by the United States Library of Congress, for use in bibliographic records. LC Subject ...
(LCSH) *
Medical Subject Headings Medical Subject Headings (MeSH) is a comprehensive controlled vocabulary for the purpose of indexing journal articles and books in the life sciences. It serves as a thesaurus that facilitates searching. Created and updated by the United States N ...
(MeSH) * Polythematic Structured Subject Heading System (PSH) * Subject Headings Authority File (SWD)


See also

* Index (publishing) * Keyword density * Subject (documents) *
Tag (metadata) In information systems, a tag is a keyword or term assigned to a piece of information (such as an Internet bookmark, multimedia, database record, or computer file). This kind of metadata helps describe an item and allows it to be found agai ...
*
Tag cloud A tag cloud (also known as a word cloud, wordle or weighted list in visual design) is a visual representation of text data, which is often used to depict keyword metadata on websites, or to visualize free form text. Tags are usually single word ...


References


Further reading

* {{Authority control Information retrieval techniques Thesauri